-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: header downloading get stuck #4459
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic, thank you!
@@ -822,7 +822,7 @@ async fn sync_headers_in_reverse<DB: Blockstore + Sync + Send + 'static>( | |||
|
|||
#[allow(deprecated)] // Tracking issue: https://github.com/ChainSafe/forest/issues/3157 | |||
let wp = WithProgressRaw::new("Downloading headers", total_size as u64); | |||
while pending_tipsets.last().epoch() > until_epoch { | |||
'outer: while pending_tipsets.last().epoch() > until_epoch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the until_epoch
is null, it relies on break 'outer;
to break this loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we avoid the inner/outer loop complexity by refactoring this to smaller functions that could potentially be unit tested? At least the block for tipset in network_tipsets {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the 'outer
label, how about this
// breaks the loop when `until_epoch` has been added to `pending_tipsets`
// or when `until_epoch` is null. Note that in the latter case, the outer
// while condition is always true, and it relies on this flag to break
// the loop
let should_terminate = 'append_tipsets: {
for tipset in network_tipsets {
// This could happen when the `until_epoch` is a null epoch
if tipset.epoch() < until_epoch {
break 'append_tipsets true;
}
validate_tipset_against_cache(bad_block_cache, tipset.key(), &accepted_blocks)?;
accepted_blocks.extend(tipset.cids());
tracker.write().set_epoch(tipset.epoch());
pending_tipsets.push(tipset);
}
false
};
if should_terminate {
break;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's up to you, but I'd still recommend extracting the loop to a separate function. This makes it possible to test and is subjectively easier for the reader of the parent function to grasp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this then? It still highly relies on the comments to explain this piece of code though
// tipsets is sorted by epoch in descending order
// returns true when `until_epoch_inclusive` is overreached
fn for_each_tipset_until_epoch_overreached(
tipsets: impl IntoIterator<Item = Arc<Tipset>>,
until_epoch_inclusive: ChainEpoch,
mut callback: impl FnMut(Arc<Tipset>) -> Result<(), TipsetRangeSyncerError>,
) -> Result<bool, TipsetRangeSyncerError> {
for tipset in tipsets {
if tipset.epoch() < until_epoch_inclusive {
return Ok(true);
}
callback(tipset)?;
}
Ok(false)
}
let callback = |tipset: Arc<Tipset>| {
validate_tipset_against_cache(bad_block_cache, tipset.key(), &accepted_blocks)?;
accepted_blocks.extend(tipset.cids());
tracker.write().set_epoch(tipset.epoch());
pending_tipsets.push(tipset);
Ok(())
};
// Breaks the loop when `until_epoch` is overreached, which happens
// when there are null tipsets in the queried range.
// Note that when the `until_epoch` is null, the outer while condition
// is always true, and it relies on this flag to break the loop
if for_each_tipset_until_epoch_overreached(network_tipsets, until_epoch, callback)? {
// Breaks when the `until_epoch` is overreached.
break;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this has been queued, I will make this refactor in a separate PR and let's discuss there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Summary of changes
Root cause: when the local head is followed by a null Tipset,
sync_headers_in_reverse
runs into a dead loope.g. When the local head is at epoch
1742466
which is followed by a null tipset at epoch1742467
, the last queried tipsets innetwork_tipsets
are at epoch [1742466, ..] which are all filtered out, and lead the loop into a dead one.This is a regression introduced in #4291
In lotus, it breaks the outer loop directly with the label
Changes introduced in this pull request:
until_epoch
Reference issue to close (if applicable)
Closes #4441
Other information and links
Change checklist